Skip to content

Conversation

@PhilKes
Copy link

@PhilKes PhilKes commented Jan 14, 2023

This PR Closes #91878 by adding the write_load_forecast field to the DataStreamsStats which is returned by GET /_data_streams/_stats

@elasticsearchmachine elasticsearchmachine added v8.7.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jan 14, 2023
@PhilKes
Copy link
Author

PhilKes commented Jan 14, 2023

@dakrone could you please review this PR?

return new DataStreamsStatsAction.DataStreamShardStats(indexShard.routingEntry(), storeStats, maxTimestamp);
var indexMetadata = indexService.getMetadata();
assert indexMetadata != null;
double writeLoadForecast = writeLoadForecaster.getForecastedWriteLoad(indexMetadata).orElse(0.0);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not entirely sure if this usage of the WriteLoadForecaster is correct, can somebody verify that?

Copy link
Contributor

@DaveCTurner DaveCTurner Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage is ok except:

  • we should preserve empty rather than mapping it to 0. The difference between "write load is zero" and "write load is missing" is important.

  • there's no need to do this for each shard. Instead we should extract the write load from the index metadata in getResponseFactory.

  • we only want the write load from the write index of the data stream

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB edited^

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DaveCTurner for your feedback, I fixed the things you mentioned, writeLoadForecast is now Nullable and extracted only from the write index of the data stream

@pxsalehi pxsalehi added :Data Management/Data streams Data streams and their lifecycles and removed needs:triage Requires assignment of a team area label labels Jan 17, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PhilKes, I left a few more comments. Also I don't think this will even pass ./gradlew precommit as it stands, make sure you run that and fix up the problems it raises.

.index(indexAbstraction.getWriteIndex())
.getForecastedWriteLoad();
if (writeLoadForeCast.isPresent()) {
stats.writeLoadForecast = writeLoadForeCast.getAsDouble();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better, thanks! However, this is now just the load for one shard of the write index. I think we want to multiply this by the number of shards in that index too.

Actually I can see value in knowing both the per-shard and per-index numbers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added the multiplication with getNumberOfShards().
Do you think I should add the per-shard write load too? Having writeLoadForecast and writeLoadForecastPerShard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should add the per-shard write load too?

Yes please (let's make both of the names specific: ...PerShard and ...PerIndex)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should add the per-shard write load too?

Yes please (let's make both of the names specific: ...PerShard and ...PerIndex)

Done

if (indexAbstraction.getType() == IndexAbstraction.Type.DATA_STREAM) {
IndexAbstraction.DataStream dataStream = (IndexAbstraction.DataStream) indexAbstraction;
AggregatedStats stats = aggregatedDataStreamsStats.computeIfAbsent(dataStream.getName(), s -> new AggregatedStats());
OptionalDouble writeLoadForeCast = clusterState.getMetadata()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
OptionalDouble writeLoadForeCast = clusterState.getMetadata()
OptionalDouble writeLoadForecast = clusterState.getMetadata()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed spelling error

AggregatedStats stats = aggregatedDataStreamsStats.computeIfAbsent(dataStream.getName(), s -> new AggregatedStats());
OptionalDouble writeLoadForeCast = clusterState.getMetadata()
.index(indexAbstraction.getWriteIndex())
.getForecastedWriteLoad();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might this throw a NullPointerException sometimes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats true, getWriteIndex() can return null, I added a null-check, and also precommit told me to use WriteLoadForecaster to get the writeload instead of IndexMetadata.getForecastedWriteLoad().

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me. A few more non-implementation tasks needed:

  • I'd like there to be a test like the ones in DataStreamsStatsTests where they aren't null (for which I think you will need to make a test-specific plugin which overrides ClusterPlugin#createWriteLoadForecasters).

  • These new fields need mentioning in docs/reference/indices/data-stream-stats.asciidoc

  • Similarly it would be good to have some mention of them in modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/120_data_streams_stats.yml, just to say that they really do exist in the output (null is fine here).

@PhilKes
Copy link
Author

PhilKes commented Feb 15, 2023

Implementation looks good to me. A few more non-implementation tasks needed:

* [ ]  I'd like there to be a test like the ones in `DataStreamsStatsTests` where they aren't `null` (for which I think you will need to make a test-specific plugin which overrides `ClusterPlugin#createWriteLoadForecasters`).

* [ ]  These new fields need mentioning in `docs/reference/indices/data-stream-stats.asciidoc`

* [ ]  Similarly it would be good to have some mention of them in `modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/120_data_streams_stats.yml`, just to say that they really do exist in the output (null is fine here).

Thank you @DaveCTurner, but I need some guidance on this. I added the fields to the data-stream-stats.asciidoc, Im not quite sure when exactly the forecasted write load can be null, therefore I am not sure if a note on those fields is neccessary and what it should entail.

I added the fields to the 120_data_streams_stats.yml.

For the actual Test I tried to implement a new Test class DataStreamsStatsWriteLoadTests with the same setup as DataStreamsStatsTests, but with an additional new ClusterPlugin StaticWriteLoadForecasterPlugin that simply implements WriteLoadForecaster by returning 0.25 for any getForecastedWriteLoad call. I now wanted to implement a basic test like testStatsExistingDataStream, creating a random DataStream but including how many shards it should have (to actually test the perShard/perIndex values). But I am not sure how to control how many shards the DataStream/Write-Index is created with, can you help me on this?

EDIT: Ok I found out I can set the number of shards with the settings when creating the Template

new Template(Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, shards).build(),

I guess I will simply add a Test case to the DataStreamsStatsTests instead of creating a new Test class.

@DaveCTurner
Copy link
Contributor

Im not quite sure when exactly the forecasted write load can be null, therefore I am not sure if a note on those fields is neccessary and what it should entail.

I'm ok not to mention this in the docs.

Ok I found out I can set the number of shards with the settings when creating the Template

Yes that sounds right.

@PhilKes
Copy link
Author

PhilKes commented Feb 15, 2023

Im not quite sure when exactly the forecasted write load can be null, therefore I am not sure if a note on those fields is neccessary and what it should entail.

I'm ok not to mention this in the docs.

Ok I found out I can set the number of shards with the settings when creating the Template

Yes that sounds right.

Ok so I added the testWriteLoadStatsExistingDataStream to DataStreamsStatsTests with a datastream with 3 shards and asserting the per_shard and per_index forecast accordingly.

@PhilKes PhilKes requested a review from DaveCTurner March 15, 2023 11:28
# Conflicts:
#	modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/DataStreamsStatsTransportAction.java
#	modules/data-streams/src/test/java/org/elasticsearch/datastreams/DataStreamsStatsTests.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Data streams Data streams and their lifecycles external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

write_load stats should be part of the _data_stream/stats